Because distributions are so important to biostatistics, it’s a good practice to prepare a histogram for every numerical variable you plan to analyze. That way, you can see whether it’s noticeably skewed and, if so, whether a logarithmic transformation makes the distribution normal enough that you can use statistics intended for normal distributions on your data.
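
If you work in Python, the following minimal sketch shows this check on simulated data. The lognormal sample, its parameters, and the helper names `sample_skewness` and `text_histogram` are illustrative assumptions, not part of any particular package. It measures skewness with the adjusted Fisher-Pearson coefficient, which is near 0 for symmetric data and positive when the right tail is long, and it prints a crude text histogram before and after taking logs.

```python
import math
import random
import statistics

def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1): about 0 for symmetric data,
    positive when the right tail is longer than the left."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

def text_histogram(values, bins=10, width=40):
    """Print a crude sideways histogram: one row of '#' marks per bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / step), bins - 1)  # put the maximum in the last bin
        counts[i] += 1
    scale = width / max(counts)
    for i, c in enumerate(counts):
        left = lo + i * step
        print(f"{left:9.2f} | {'#' * round(c * scale)}")

# Simulated, right-skewed "enzyme levels" (hypothetical data)
rng = random.Random(1)
enzyme = [rng.lognormvariate(3.0, 0.8) for _ in range(500)]
log_enzyme = [math.log(v) for v in enzyme]

print(f"skewness, raw values: {sample_skewness(enzyme):.2f}")
print(f"skewness, log values: {sample_skewness(log_enzyme):.2f}")
text_histogram(enzyme)
print()
text_histogram(log_enzyme)
```

Values with a long right tail like these typically give a large positive skewness; after the log transformation, the skewness should drop close to 0 and the histogram should look roughly bell-shaped. In practice you’d draw the histograms with a plotting package, but the reasoning is the same.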

If you can’t find any transformation that makes your data look even approximately normal, then you have to analyze your data using nonparametric methods, which don’t assume that your data are normally distributed.

Summarizing grouped data with bars, boxes, and whiskers

Sometimes you want to show how a numerical variable differs from one group of participants to another. For example, blood levels of a certain cardiovascular enzyme vary among the cardiology patients at four different clinics: Clinics A, B, C, and D. Two types of graphs are commonly used for this purpose: bar charts and box-and-whiskers plots.

Bar charts

One simple way to display and compare the means of several groups of data is with a bar chart, like the one shown in Figure 9-7a. Here, the height of each bar equals the mean (or median, or geometric mean) enzyme level for the patients at the clinic that the bar represents.
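
Which of these summaries you use for the bar height matters when the data are skewed. Here’s a small Python sketch that computes all three candidate bar heights with the standard library’s `statistics` module; the enzyme levels for the one clinic are made up for illustration.

```python
import statistics

# Hypothetical enzyme levels (arbitrary units) for the patients at one clinic;
# the single high value of 95 gives the data a long right tail
clinic_a = [12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 31.0, 40.0, 95.0]

mean = statistics.fmean(clinic_a)
median = statistics.median(clinic_a)
# Geometric mean = exp(mean of the logs); every value must be greater than 0
geo_mean = statistics.geometric_mean(clinic_a)

print(f"mean {mean:.1f}, median {median:.1f}, geometric mean {geo_mean:.1f}")
```

With one unusually high value, the mean is pulled above both the median and the geometric mean, so a bar drawn at the mean would be taller than a bar drawn at either of the other two summaries.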

The bar chart becomes even more informative if you indicate the spread of values for each clinic’s sample by placing lines that extend one SD above and below the top of each bar, as shown in Figure 9-7b. These lines are usually referred to as error bars, an unfortunate choice of words that can cause confusion when error bars are added to a bar chart. In this case, error refers to statistical error (described in Chapter 6).
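
As a rough sketch of the numbers behind a chart like Figure 9-7b, the following Python code computes each bar’s height and where its error bar would start and stop. The enzyme levels for the four clinics are hypothetical, and the helper name `bar_chart_summary` is invented for this example.

```python
import statistics

# Hypothetical enzyme levels (arbitrary units) for patients at four clinics
levels = {
    "Clinic A": [42, 55, 38, 61, 47, 50],
    "Clinic B": [70, 82, 65, 90, 77, 74],
    "Clinic C": [30, 28, 41, 35, 33, 39],
    "Clinic D": [55, 60, 48, 72, 66, 58],
}

def bar_chart_summary(groups):
    """For each group, return (bar height, bottom of error bar, top of error bar),
    where the bar height is the mean and the error bar spans mean +/- 1 SD."""
    summary = {}
    for name, values in groups.items():
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
        summary[name] = (mean, mean - sd, mean + sd)
    return summary

for name, (height, low, high) in bar_chart_summary(levels).items():
    print(f"{name}: bar height {height:.1f}, error bar from {low:.1f} to {high:.1f}")
```

If you draw the chart with matplotlib, the `yerr` argument of its `bar` function takes the half-length of each error bar, which here is simply each group’s SD.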

© John Wiley & Sons, Inc.

FIGURE 9-7: Bar charts showing mean values (a) and standard deviations (b).

But even with error bars, a bar chart still doesn’t provide a picture of the distribution of enzyme levels within each group. Are the values skewed? Are there outliers? You could make a histogram for each subgroup of patients (Clinic A, Clinic B, Clinic C, and Clinic D), but if you think about it, four histograms would take up a lot of space. There is a solution for this! Keep reading to find out what it is.
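
To see how a bar chart can hide the shape of a distribution, consider two small made-up groups in which the second is the mirror image of the first (each value in Group Y is 10 minus a value in Group X). This Python sketch shows that the two groups have the same mean and SD, so they would get identical bars and error bars, even though Group X trails off to the right and Group Y trails off to the left. The data and the helper name `describe` are hypothetical.

```python
import statistics

# Made-up data: group_y mirrors group_x around 5 (y = 10 - x)
group_x = [2, 4, 4, 4, 5, 5, 7, 9]
group_y = [10 - x for x in group_x]

def describe(values):
    """The summaries a bar chart shows (mean, SD) plus a few it doesn't."""
    return {
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),  # sample SD
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

for name, group in [("Group X", group_x), ("Group Y", group_y)]:
    s = describe(group)
    print(f"{name}: mean {s['mean']:.2f}, SD {s['sd']:.2f}, "
          f"median {s['median']}, range {s['min']} to {s['max']}")
```

Both groups have a mean of 5 and an SD of about 2.14, but their medians (4.5 versus 5.5) and ranges (2 to 9 versus 1 to 8) reveal the difference in shape that the bar chart conceals.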

Box-and-whiskers charts

The box-and-whiskers plot (or B&W, or just box plot) uses very little space to display a lot of information about the distribution of numbers in one or more groups of participants. A box plot of the